Overview

Dataset statistics

Number of variables: 11
Number of observations: 690
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 59.4 KiB
Average record size in memory: 88.2 B

Variable types

NUM: 10
BOOL: 1

Reproduction

Analysis started: 2020-07-14 09:30:50.591758
Analysis finished: 2020-07-14 09:32:01.020959
Duration: 1 minute and 10.43 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Uniformity of Cell Shape is highly correlated with Uniformity of Cell Size (High correlation)
Uniformity of Cell Size is highly correlated with Uniformity of Cell Shape (High correlation)
df_index has unique values (Unique)

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct count: 690
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 348.9869565217391
Minimum: 0
Maximum: 698
Zeros: 1
Zeros (%): 0.1%
Memory size: 5.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 34.45
Q1: 172.25
Median: 351.5
Q3: 523.75
95-th percentile: 662.55
Maximum: 698
Range: 698
Interquartile range (IQR): 351.5

Descriptive statistics

Standard deviation: 202.4902975
Coefficient of variation (CV): 0.5802231107
Kurtosis: -1.209901392
Mean: 348.9869565
Median Absolute Deviation (MAD): 176
Skewness: -0.00705511275
Sum: 240801
Variance: 41002.32058

Histogram with fixed size bins (bins=10)
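Value | Count | Frequency (%)
698 | 1 | 0.1%
227 | 1 | 0.1%
235 | 1 | 0.1%
234 | 1 | 0.1%
233 | 1 | 0.1%
232 | 1 | 0.1%
231 | 1 | 0.1%
230 | 1 | 0.1%
229 | 1 | 0.1%
228 | 1 | 0.1%
Other values (680) | 680 | 98.6%

Value | Count | Frequency (%)
0 | 1 | 0.1%
1 | 1 | 0.1%
2 | 1 | 0.1%
3 | 1 | 0.1%
4 | 1 | 0.1%

Value | Count | Frequency (%)
698 | 1 | 0.1%
697 | 1 | 0.1%
696 | 1 | 0.1%
695 | 1 | 0.1%
694 | 1 | 0.1%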
 

Clump Thickness
Real number (ℝ≥0)

Distinct count: 10
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.428985507246376
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 4
Q3: 6
95-th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.817378242
Coefficient of variation (CV): 0.6361227052
Kurtosis: -0.6277754309
Mean: 4.428985507
Median Absolute Deviation (MAD): 2
Skewness: 0.5893875385
Sum: 3056
Variance: 7.937620159

Histogram with fixed size bins (bins=10)
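Value | Count | Frequency (%)
1 | 142 | 20.6%
5 | 129 | 18.7%
3 | 105 | 15.2%
4 | 80 | 11.6%
10 | 69 | 10.0%
2 | 50 | 7.2%
8 | 46 | 6.7%
6 | 33 | 4.8%
7 | 23 | 3.3%
9 | 13 | 1.9%

Value | Count | Frequency (%)
1 | 142 | 20.6%
2 | 50 | 7.2%
3 | 105 | 15.2%
4 | 80 | 11.6%
5 | 129 | 18.7%

Value | Count | Frequency (%)
10 | 69 | 10.0%
9 | 13 | 1.9%
8 | 46 | 6.7%
7 | 23 | 3.3%
6 | 33 | 4.8%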
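
The descriptive statistics reported for Clump Thickness can be cross-checked against its frequency table. The following is a minimal sketch rather than the profiler's own code: the `counts` dictionary is transcribed by hand from the table above, and the use of the sample variance (dividing by n - 1) is an assumption that happens to agree with the reported figures.

```python
import math

# Frequency table for Clump Thickness, transcribed from the report: value -> count.
counts = {1: 142, 5: 129, 3: 105, 4: 80, 10: 69, 2: 50, 8: 46, 6: 33, 7: 23, 9: 13}

n = sum(counts.values())                        # number of observations
total = sum(v * c for v, c in counts.items())   # the "Sum" statistic
mean = total / n

# Sample variance (divides by n - 1); assumed, not confirmed by the report.
variance = sum(c * (v - mean) ** 2 for v, c in counts.items()) / (n - 1)
std = math.sqrt(variance)

# Coefficient of variation: standard deviation relative to the mean.
cv = std / mean

print(n, total)
print(round(mean, 6), round(variance, 6), round(std, 6), round(cv, 6))
```

The same check applies to any of the integer-valued variables in this report, since each frequency table lists every distinct value.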
 

Uniformity of Cell Size
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 10
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.1333333333333333
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 5
95-th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 3.042450824
Coefficient of variation (CV): 0.9709949438
Kurtosis: 0.1009475704
Mean: 3.133333333
Median Absolute Deviation (MAD): 0
Skewness: 1.231116626
Sum: 2162
Variance: 9.256507015

Histogram with fixed size bins (bins=10)
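Value | Count | Frequency (%)
1 | 378 | 54.8%
10 | 65 | 9.4%
3 | 51 | 7.4%
2 | 45 | 6.5%
4 | 40 | 5.8%
5 | 30 | 4.3%
8 | 29 | 4.2%
6 | 27 | 3.9%
7 | 19 | 2.8%
9 | 6 | 0.9%

Value | Count | Frequency (%)
1 | 378 | 54.8%
2 | 45 | 6.5%
3 | 51 | 7.4%
4 | 40 | 5.8%
5 | 30 | 4.3%

Value | Count | Frequency (%)
10 | 65 | 9.4%
9 | 6 | 0.9%
8 | 29 | 4.2%
7 | 19 | 2.8%
6 | 27 | 3.9%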
 

Uniformity of Cell Shape
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 10
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.2043478260869565
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 5
95-th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.960844365
Coefficient of variation (CV): 0.924008418
Kurtosis: 0.0134964593
Mean: 3.204347826
Median Absolute Deviation (MAD): 0
Skewness: 1.161617562
Sum: 2211
Variance: 8.766599356

Histogram with fixed size bins (bins=10)
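Value | Count | Frequency (%)
1 | 347 | 50.3%
2 | 59 | 8.6%
10 | 56 | 8.1%
3 | 56 | 8.1%
4 | 44 | 6.4%
5 | 33 | 4.8%
7 | 30 | 4.3%
6 | 30 | 4.3%
8 | 28 | 4.1%
9 | 7 | 1.0%

Value | Count | Frequency (%)
1 | 347 | 50.3%
2 | 59 | 8.6%
3 | 56 | 8.1%
4 | 44 | 6.4%
5 | 33 | 4.8%

Value | Count | Frequency (%)
10 | 56 | 8.1%
9 | 7 | 1.0%
8 | 28 | 4.1%
7 | 30 | 4.3%
6 | 30 | 4.3%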
 

Marginal Adhesion
Real number (ℝ≥0)

Distinct count: 10
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.827536231884058
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 4
95-th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.867787365
Coefficient of variation (CV): 1.014235408
Kurtosis: 0.9251918611
Mean: 2.827536232
Median Absolute Deviation (MAD): 0
Skewness: 1.505406523
Sum: 1951
Variance: 8.224204371

Histogram with fixed size bins (bins=10)
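Value | Count | Frequency (%)
1 | 400 | 58.0%
3 | 58 | 8.4%
2 | 56 | 8.1%
10 | 55 | 8.0%
4 | 33 | 4.8%
8 | 25 | 3.6%
5 | 23 | 3.3%
6 | 22 | 3.2%
7 | 13 | 1.9%
9 | 5 | 0.7%

Value | Count | Frequency (%)
1 | 400 | 58.0%
2 | 56 | 8.1%
3 | 58 | 8.4%
4 | 33 | 4.8%
5 | 23 | 3.3%

Value | Count | Frequency (%)
10 | 55 | 8.0%
9 | 5 | 0.7%
8 | 25 | 3.6%
7 | 13 | 1.9%
6 | 22 | 3.2%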
 

Single Epithelial Cell Size
Real number (ℝ≥0)

Distinct count: 10
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.2130434782608694
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 2
Q3: 4
95-th percentile: 8
Maximum: 10
Range: 9
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 2.200963837
Coefficient of variation (CV): 0.6850090427
Kurtosis: 2.206547554
Mean: 3.213043478
Median Absolute Deviation (MAD): 0
Skewness: 1.716779913
Sum: 2217
Variance: 4.844241812

Histogram with fixed size bins (bins=10)
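Value | Count | Frequency (%)
2 | 382 | 55.4%
3 | 71 | 10.3%
4 | 48 | 7.0%
1 | 45 | 6.5%
6 | 41 | 5.9%
5 | 39 | 5.7%
10 | 30 | 4.3%
8 | 20 | 2.9%
7 | 12 | 1.7%
9 | 2 | 0.3%

Value | Count | Frequency (%)
1 | 45 | 6.5%
2 | 382 | 55.4%
3 | 71 | 10.3%
4 | 48 | 7.0%
5 | 39 | 5.7%

Value | Count | Frequency (%)
10 | 30 | 4.3%
9 | 2 | 0.3%
8 | 20 | 2.9%
7 | 12 | 1.7%
6 | 41 | 5.9%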
 

Bare Nuclei
Real number (ℝ≥0)

Distinct count: 10
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.482608695652174
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 5
95-th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 3.617063918
Coefficient of variation (CV): 1.038607617
Kurtosis: -0.7159815407
Mean: 3.482608696
Median Absolute Deviation (MAD): 0
Skewness: 1.029145879
Sum: 2403
Variance: 13.08315139

Histogram with fixed size bins (bins=10)
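Value | Count | Frequency (%)
1 | 412 | 59.7%
10 | 130 | 18.8%
5 | 30 | 4.3%
2 | 30 | 4.3%
3 | 28 | 4.1%
8 | 20 | 2.9%
4 | 19 | 2.8%
9 | 9 | 1.3%
7 | 8 | 1.2%
6 | 4 | 0.6%

Value | Count | Frequency (%)
1 | 412 | 59.7%
2 | 30 | 4.3%
3 | 28 | 4.1%
4 | 19 | 2.8%
5 | 30 | 4.3%

Value | Count | Frequency (%)
10 | 130 | 18.8%
9 | 9 | 1.3%
8 | 20 | 2.9%
7 | 8 | 1.2%
6 | 4 | 0.6%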
 

Bland Chromatin
Real number (ℝ≥0)

Distinct count: 10
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.436231884057971
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 3
Q3: 5
95-th percentile: 8
Maximum: 10
Range: 9
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.444060424
Coefficient of variation (CV): 0.711261785
Kurtosis: 0.1844236965
Mean: 3.436231884
Median Absolute Deviation (MAD): 1
Skewness: 1.101265768
Sum: 2371
Variance: 5.973431354

Histogram with fixed size bins (bins=10)
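Value | Count | Frequency (%)
2 | 165 | 23.9%
3 | 160 | 23.2%
1 | 151 | 21.9%
7 | 71 | 10.3%
4 | 40 | 5.8%
5 | 34 | 4.9%
8 | 28 | 4.1%
10 | 20 | 2.9%
9 | 11 | 1.6%
6 | 10 | 1.4%

Value | Count | Frequency (%)
1 | 151 | 21.9%
2 | 165 | 23.9%
3 | 160 | 23.2%
4 | 40 | 5.8%
5 | 34 | 4.9%

Value | Count | Frequency (%)
10 | 20 | 2.9%
9 | 11 | 1.6%
8 | 28 | 4.1%
7 | 71 | 10.3%
6 | 10 | 1.4%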
 

Normal Nucleoli
Real number (ℝ≥0)

Distinct count: 10
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.8855072463768114
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 4
95-th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 3.067682349
Coefficient of variation (CV): 1.063134516
Kurtosis: 0.4199460001
Mean: 2.885507246
Median Absolute Deviation (MAD): 0
Skewness: 1.405286567
Sum: 1991
Variance: 9.410674996

Histogram with fixed size bins (bins=10)
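Value | Count | Frequency (%)
1 | 436 | 63.2%
10 | 61 | 8.8%
3 | 42 | 6.1%
2 | 36 | 5.2%
8 | 24 | 3.5%
6 | 22 | 3.2%
5 | 19 | 2.8%
4 | 18 | 2.6%
9 | 16 | 2.3%
7 | 16 | 2.3%

Value | Count | Frequency (%)
1 | 436 | 63.2%
2 | 36 | 5.2%
3 | 42 | 6.1%
4 | 18 | 2.6%
5 | 19 | 2.8%

Value | Count | Frequency (%)
10 | 61 | 8.8%
9 | 16 | 2.3%
8 | 24 | 3.5%
7 | 16 | 2.3%
6 | 22 | 3.2%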
 

Mitosesi
Real number (ℝ≥0)

Distinct count: 9
Unique (%): 1.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.5942028985507246
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 1
95-th percentile: 5
Maximum: 10
Range: 9
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.724230466
Coefficient of variation (CV): 1.081562747
Kurtosis: 12.4893058
Mean: 1.594202899
Median Absolute Deviation (MAD): 0
Skewness: 3.541474105
Sum: 1100
Variance: 2.972970699

Histogram with fixed size bins (bins=10)
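Value | Count | Frequency (%)
1 | 571 | 82.8%
2 | 35 | 5.1%
3 | 32 | 4.6%
10 | 14 | 2.0%
4 | 12 | 1.7%
7 | 9 | 1.3%
8 | 8 | 1.2%
5 | 6 | 0.9%
6 | 3 | 0.4%

Value | Count | Frequency (%)
1 | 571 | 82.8%
2 | 35 | 5.1%
3 | 32 | 4.6%
4 | 12 | 1.7%
5 | 6 | 0.9%

Value | Count | Frequency (%)
10 | 14 | 2.0%
8 | 8 | 1.2%
7 | 9 | 1.3%
6 | 3 | 0.4%
5 | 6 | 0.9%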
 

Class
Boolean

Distinct count: 2
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 5.4 KiB

Value | Count | Frequency (%)
1 | 452 | 65.5%
0 | 238 | 34.5%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, where -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
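
The calculation described above can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the code the profiler runs: the function name `pearson_r` is hypothetical, and the two input lists are the Uniformity of Cell Size and Uniformity of Cell Shape values from the ten rows shown in this report's "First rows" sample.

```python
import math

def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of deviations are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(ss_x * ss_y)

# Values taken from the ten "First rows" of the sample (an excerpt, not the full data).
size = [1, 4, 1, 8, 1, 10, 1, 1, 1, 2]    # Uniformity of Cell Size
shape = [1, 4, 1, 8, 1, 10, 1, 2, 1, 1]   # Uniformity of Cell Shape

r = pearson_r(size, shape)
print(r)

# Shifting and (positively) rescaling either variable leaves r unchanged.
print(pearson_r([2 * x + 3 for x in size], [0.5 * y - 1 for y in shape]))
```

Because only ten rows are used, the printed value illustrates the method and should not be read as the correlation over all 690 observations.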

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, where -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
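
As a rough sketch of that recipe, the example below ranks each variable and then applies the Pearson formula to the ranks. The helper names (`average_ranks`, `pearson_r`, `spearman_rho`) are hypothetical, and giving tied values the average of their positions is an assumption; it is a common convention, but the report does not say how ties are handled.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables of X and Y."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# A monotonic but nonlinear relationship: y = x squared.
xs = [1, 2, 3, 4, 5]
ys = [x * x for x in xs]
print(spearman_rho(xs, ys))  # perfectly monotonic, so the ranks agree exactly
print(pearson_r(xs, ys))     # lower, because the relationship is not linear
```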

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, where -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is then the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
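
The pair-counting definition can be written out directly. The sketch below implements exactly the formula as stated in the text, which is usually called tau-a; the function name `kendall_tau_a` is hypothetical. Statistical libraries often report a tie-adjusted variant instead (for example, `scipy.stats.kendalltau` computes tau-b by default), so values for heavily tied data such as this dataset may differ from what this simple version returns.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1      # both variables move the same way
        elif direction < 0:
            discordant += 1      # the variables move in opposite ways
        # Pairs tied in x or y count as neither concordant nor discordant.
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs

print(kendall_tau_a([1, 2, 3, 4], [1, 2, 3, 4]))  # identical ordering
print(kendall_tau_a([1, 2, 3, 4], [4, 3, 2, 1]))  # reversed ordering
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))        # two concordant pairs, one discordant
```

The double loop over all pairs is quadratic in the number of observations, which is fine for a few hundred rows but is not how optimized libraries compute τ.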

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

Sample

First rows

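(index) | df_index | Clump Thickness | Uniformity of Cell Size | Uniformity of Cell Shape | Marginal Adhesion | Single Epithelial Cell Size | Bare Nuclei | Bland Chromatin | Normal Nucleoli | Mitosesi | Class
0 | 0 | 5 | 1 | 1 | 1 | 2 | 1 | 3 | 1 | 1 | 1
1 | 1 | 5 | 4 | 4 | 5 | 7 | 10 | 3 | 2 | 1 | 1
2 | 2 | 3 | 1 | 1 | 1 | 2 | 2 | 3 | 1 | 1 | 1
3 | 3 | 6 | 8 | 8 | 1 | 3 | 4 | 3 | 7 | 1 | 1
4 | 4 | 4 | 1 | 1 | 3 | 2 | 1 | 3 | 1 | 1 | 1
5 | 5 | 8 | 10 | 10 | 8 | 7 | 10 | 9 | 7 | 1 | 0
6 | 6 | 1 | 1 | 1 | 1 | 2 | 10 | 3 | 1 | 1 | 1
7 | 7 | 2 | 1 | 2 | 1 | 2 | 1 | 3 | 1 | 1 | 1
8 | 8 | 2 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 5 | 1
9 | 9 | 4 | 2 | 1 | 1 | 2 | 1 | 2 | 1 | 1 | 1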

Last rows

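(index) | df_index | Clump Thickness | Uniformity of Cell Size | Uniformity of Cell Shape | Marginal Adhesion | Single Epithelial Cell Size | Bare Nuclei | Bland Chromatin | Normal Nucleoli | Mitosesi | Class
680 | 689 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 8 | 1
681 | 690 | 1 | 1 | 1 | 3 | 2 | 1 | 1 | 1 | 1 | 1
682 | 691 | 5 | 10 | 10 | 5 | 4 | 5 | 4 | 4 | 1 | 0
683 | 692 | 3 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 1
684 | 693 | 3 | 1 | 1 | 1 | 2 | 1 | 2 | 1 | 2 | 1
685 | 694 | 3 | 1 | 1 | 1 | 3 | 2 | 1 | 1 | 1 | 1
686 | 695 | 2 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 1
687 | 696 | 5 | 10 | 10 | 3 | 7 | 3 | 8 | 10 | 2 | 0
688 | 697 | 4 | 8 | 6 | 4 | 3 | 4 | 10 | 6 | 1 | 0
689 | 698 | 4 | 8 | 8 | 5 | 4 | 5 | 10 | 4 | 1 | 0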